Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace cgi.FieldStorage with multipart #466

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

luhn
Copy link
Contributor

@luhn luhn commented Sep 17, 2024

Gets rid of the deprecation warnings and resolves #437

This replaces dependencies on the stdlib cgi for the multipart package, as recommended by PEP-0594

Unfortunately MultiDict exposes FieldStorage objects publicly, and since that's no longer an option I created a new MultiDictFile object that has the same shape. Beyond that, though, FieldStorage is only an implementation detail and can be swapped out cleanly.

There's one test now failing: test_cgi_escaping_fix. It looks to be special handling of a malformed multipart doc, but I can't figure out the motivation for it. It was originally added in d57f294 and the linked repository no longer exists, so the context is lost.

As an aside: FieldStorage is an extremely weird API. The class represents both a single multipart part and the full multipart document.

@mmerickel
Copy link
Member

Do you have a comment to throw into the mix here about the pros/cons of this versus using https://pypi.org/project/legacy-cgi/ which arguably would be more bw-compat?

For example, I have a thought we should ship a version that uses legacy-cgi to unblock folks potentially short-term, and then in a newer version move to multipart which would have likely some clear bw-incompat changes. But I'm typing all of this without being aware of if there are bw-incompat changes in this PR. I don't see those listed out so maybe I'm asking for that.

@luhn
Copy link
Contributor Author

luhn commented Oct 7, 2024

There shouldn't be any major bw compat issues, I took care to keep the same behavior and interface. The only differences I'm aware of:

  • file objects are now of type webob.multidict.MultiDictFile instead of cgi.FieldStorage, but the properties should be exactly the same. AFAIA this is the only place where webob exposed cgi directly to the user.
  • The broken multipart data in test_cgi_escaping_fix is no longer accepted input. multipart readme says "No support for clearly broken input (e.g. invalid line breaks or header names)", so if it's important to keep that behavior we'll need to look elsewhere.

legacy-cgi could be a good short-term fix, maybe get a patch version out soon that adds the dependency for 3.13+ while this PR is being discussed. But I believe multipart is the better long-term solution:

  • It's recommended by the PEP.
  • cgi was already a hacky solution (webob.compat.FieldStorage), multipart does not have the same problems.
  • cgi.FieldStorage is an extremely wonky API, multipart is much more reasonable.
  • legacy-cgi will receive no updates except for bugfixes, so any issues with the behavior (once again, webob.compat.FieldStorage) are permanent.

@luhn
Copy link
Contributor Author

luhn commented Oct 8, 2024

@mgorny pointed out that there's conflicting multipart packages.

It seems like python-multipart is primarily used by Starlette and there are plans to bring the package into the main codebase, in which case the problem goes away.

Another option is vendoring multipart. It's a single file, MIT license.

@mmerickel
Copy link
Member

I would be a fan of vendoring it if the test suite impact for webob isn’t too bad. Either way this doesn’t really feel like our problem.

@luhn
Copy link
Contributor Author

luhn commented Oct 8, 2024

I'll give a shot at vendoring it. First time doing something like this, any best practices I should be aware of?

@mmerickel
Copy link
Member

waitress vendors asyncore as an example.

@luhn
Copy link
Contributor Author

luhn commented Oct 10, 2024

Small snag: multipart has since released a v1, which is a significantly stricter parser and only accepts CRLF newlines. I've had to update a few tests that this broke.

The author claims this should not affect real-world usecases.

Input that contains LF instead of CRLF is clearly broken and should be rejected, even in non-strict mode. LF alone was never valid as far as I know, all RFCs I could find (including RFC 2046 from 1996) mandate CRLF. Maybe a decade ago there were broken e-mail clients producing such output, and early multipart parsers (for email) had to support it, but I do not know of any relevant browser or HTTP client that does this today. So, removing those unit-tests should be fine, there should be no real world impact.

defnull/multipart#55 (comment)

Additionally, we have another backwards incompatibility to add to the list: Fields with no name have their name set to an empty string. (test_none_field_name)

@defnull
Copy link

defnull commented Oct 11, 2024

Additionally, we have another backwards incompatibility to add to the list: Fields with no name have their name set to an empty string. (test_none_field_name)

The name parameter is mandatory (RFC-7578 4.2) for Content-Disposition headers, but can be empty. Not sure about the behavior of cgi.FieldStorage but in multipart, a missing name option triggers an error in strict mode, or is replaced by an empty string in non-strict mode. If you need to differentiate between an empty or a missing name parameter to simulate old behavior, you could check MultipartPart.disposition or MultipartPart.headers.get("Content-Disposition") afterwards and set MultipartPart.name to None if needed.

@luhn
Copy link
Contributor Author

luhn commented Oct 17, 2024

Hey @defnull, thanks for popping in!

Good to know that name is mandatory per the RFCs. Since this PR seems like it will be part of the 2.0 release (if it gets merged), I think it makes sense to have a minor break so we can be in-line with both the standards and multipart's functionality.

@luhn
Copy link
Contributor Author

luhn commented Oct 17, 2024

@mmerickel I've vendored multipart and all its tests. 100% coverage. Excluded the vendored files from linting to remain as close as possible to the original source.

@defnull
Copy link

defnull commented Oct 17, 2024

Once (or if) Kludex/python-multipart#166 gets merged, you can de-vendor again.

@valeriupredoi
Copy link

hi folks, just a heads up there is a feedstock PR by the regro-bot to update the webob version to 1.8.9 - it'd be great to have this PR in 1.8.9 so we can start using webob with Python 3.13. Cheers muchly for the work 🍺

@mmerickel
Copy link
Member

1.8.9 was released yesterday with support for 3.13 via legacy-cgi dependency. We will still evaluate this PR as part of the larger 2.0 release we’re working on.

@valeriupredoi
Copy link

@mmerickel cheers, mate! Then I'll head over to the feedstock see if I can expedite the conda package deployment for 1.8.9 🍺

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DeprecationWarning: 'cgi' is deprecated and slated for removal in Python 3.13
4 participants